Abstract. For a static array A of n totally ordered objects, a range minimum query asks for the
position of the minimum between two specified array indices. We show how to preprocess A into a 
scheme of size 2n + o(n) bits that allows range minimum queries on A to be answered in constant time. This
space is asymptotically optimal in the important setting where access to A is not permitted after the 
preprocessing step. Our scheme can be computed in linear time, using only n + o(n) additional bits for 
construction. We also improve on LCA-computation in BPS- or DFUDS-encoded trees. 

1 Introduction 

For an array A[1, n] of n natural numbers or other objects from a totally ordered universe, a range
minimum query $\mathrm{RMQ}_A(i, j)$ for $i \le j$ returns the position of a minimum element in the sub-array
A[i, j]; i.e., $\mathrm{RMQ}_A(i, j) = \operatorname{argmin}_{i \le k \le j} \{A[k]\}$. This fundamental algorithmic problem has numerous
applications, e.g., in text indexing [1,15,36], text compression [7], document retrieval [31,37,42],
flowgraphs [19], range queries [40], and position-restricted pattern matching [8], to mention just a few.

In all of these applications, the array A in which the range minimum queries (RMQs) are 
performed is static and known in advance, which is also the scenario considered in this article. In 
this case it makes sense to preprocess A into a (preprocessing-) scheme such that future RMQs can 
be answered quickly. We can hence formulate the following problem. 

Problem 1 (RMQ-Problem). 

Given: a static array A[1, n] of n totally ordered objects.

Compute: an (ideally small) data structure, called scheme, that allows RMQs on A to be answered in
constant time.
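
To make the problem statement concrete, the following brute-force sketch answers a single query by scanning the range. It is only a reference point, not one of the schemes discussed in this article: it needs the array at query time and takes time linear in the range length. The function name `rmq_naive` and the convention of an unused slot A[0] (to mimic 1-based indexing) are assumptions made for illustration.

```python
def rmq_naive(A, i, j):
    """Return a position of a minimum in A[i..j] (1-based, inclusive).

    A is a Python list whose entry A[0] is an unused placeholder, so that
    A[1..n] matches the 1-based indexing of the text. Ties are broken in
    favour of the leftmost position; the problem statement allows any
    position of a minimum.
    """
    if not 1 <= i <= j < len(A):
        raise ValueError("need 1 <= i <= j <= n")
    best = i
    for k in range(i + 1, j + 1):
        if A[k] < A[best]:  # strict comparison keeps the leftmost minimum
            best = k
    return best


A = [None, 5, 3, 4, 3, 4]
print(rmq_naive(A, 1, 3))  # prints 2, since A[2] = 3 is the minimum of A[1..3]
```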

The historically first such scheme due to Gabow et al. [16] is based on the following idea: because 
an RMQ-instance can be transformed into an instance of lowest common ancestors (LCAs) in the 
Cartesian Tree [43], one can use any linear-time preprocessing scheme for O(1)-LCAs [3,5,23,41]
in order to answer RMQs in constant time. 

The problem with this transformation [16], both in theory and in practice, can be seen in the following
dilemma: storing the Cartesian Tree explicitly (i.e., with labels and pointers) needs O(n log n)
bits of space, while storing it succinctly in 2n + o(n) bits [4,30] does not allow mapping the array
indices to the corresponding nodes (see Sect. 1.1 for more details on why this is difficult).

A succinct data structure uses space that is close to the information-theoretic lower bound, in
the sense that objects from a universe of cardinality L are stored in (1 + o(1)) log L bits.¹ Research
on succinct data structures is very active, and we just mention some examples from the realm of
trees [4,9,18,26,30,39], dictionaries [33,34], and strings [10,11,21,22,35,38], being well aware of
the fact that this list is far from complete. This article presents the first succinct data structure for
O(1)-RMQs in the standard word-RAM model of computation (which is also the model used in all
LCA- and RMQ-schemes cited in this article).



1 Throughout this article, space is measured in bits, and log denotes the binary logarithm. 



Table 1. Preprocessing schemes for O(1)-RMQs, where |A| denotes the space for the (read-only)
input array.

reference     final space         construction space   comments
------------  ------------------  -------------------  ------------------------------------------------------------------
[5,23,41]     O(n log n) + |A|    O(n log n) + |A|     originally devised for LCA, but solve RMQ via Cartesian Tree
[3]           O(n log n) + |A|    O(n log n) + |A|     significantly simpler than previous schemes
[2]           O(n log n) + |A|    O(n log n) + |A|     only solution not based on Cartesian Trees
[13]          2n + o(n) + |A|     2n + o(n) + |A|      generalizes to (2/c)n + o(n) + |A| bits, const. c (see Footnote 2)
[14]          O(nH_k) + o(n)      2n + o(n) + |A|      H_k is the empirical entropy [28] of A (small if A is compressible)
[36]          n + o(n)            n + o(n)             only for ±1RMQ; A must be encoded as an n-bit-vector
[37]          4n + o(n)           O(n log n) + |A|     only non-systematic data structure so far
this article  2n + o(n)           3n + o(n) + |A|      final space requirement optimal



Before detailing our contribution, we first classify and summarize existing solutions for
O(1)-RMQs.

1.1 Previous Solutions for RMQ 

In accordance with common nomenclature [17], preprocessing schemes for O(1)-RMQs can be
classified into two different types: systematic and non-systematic. Systematic schemes must store the
input array A verbatim along with the additional information for answering the queries. In such
a case the query algorithm can consult A when answering the queries; this is indeed what all
systematic schemes make heavy use of. On the contrary, non-systematic schemes must be able to
obtain their final answer without consulting the array. This second type is important for at least
two reasons:

1. In some applications, e.g., in algorithms for document retrieval [31,37] or position restricted 
substring matching [8], only the position of the minimum matters, but not the value of this 
minimum. In such cases it would be a waste of space (both in theory and in practice) to keep 
the input array in memory, just for obtaining the final answer to the RMQs, as in the case of 
systematic schemes. 

2. If the time to access the elements in A is ω(1), this slowed-down access time propagates to
the time for answering RMQs if the query algorithm consults the input array. As a prominent
example, in string processing RMQ is often used in conjunction with the array of longest common
prefixes of lexicographically consecutive suffixes, the so-called LCP-array [27]. However, storing
the LCP-array efficiently in 2n + o(n) bits [36] increases the access-time to the time needed to
retrieve an entry from the corresponding suffix array [27], which is Ω(log^ε n) (constant ε > 0)
at the very best if the suffix array is also stored in compressed form [21,35]. Hence, with a
systematic scheme the time needed for answering RMQs on LCP could never be O(1) in this
case. But exactly this would be needed for constant-time navigation in RMQ-based compressed
suffix trees [15] (where for different reasons the LCP-array is still needed, so this is not the same
as the above point).

In the following, we briefly sketch previous solutions for RMQ schemes. For a summary, see Tbl. 1,
where, besides the final space consumption, in the third column we list the peak space consumption
at construction time of each scheme, which sometimes differs from the former term.



Systematic Schemes. Most schemes are based on the Cartesian Tree [43], the only exception
being the scheme due to Alstrup et al. [2]. All direct schemes [2,3,13,36] are based on the idea of
splitting the query range into several sub-queries, all of which have been precomputed, and then
returning the overall minimum as the final result. The schemes from the first three rows of Tbl. 1
have the same theoretical guarantees, with Bender et al.'s scheme [3] being less complex than the
previous ones, and Alstrup et al.'s [2] being even simpler (and most practical). The only O(n)-bit
scheme is due to Fischer and Heun [13] and achieves 2n + o(n) bits of space in addition to the
space for the input array A. It is based on an "implicit" enumeration of Cartesian Trees only for
very small blocks (instead of the whole array A). Its further advantage is that it can be adapted
to achieve entropy-bounds for compressible inputs [14]. For systematic schemes, no lower bound on
space is known.²

An important special case is Sadakane's n + o(n)-bit solution [36] for ±1RMQ, where it is assumed
that A has the property that A[i] − A[i − 1] = ±1 for all 1 < i ≤ n, and can hence be encoded
as a bit-vector S[1, n], where a '1' at position i in S indicates that A increases by 1 at position i,
and a '0' that it decreases. Because we will make use of this scheme in our new algorithm, and also
improve on its space consumption in Sect. 5, we will describe it in greater detail in Sect. 2.2.

Non-Systematic Schemes. The only existing scheme is due to Sadakane [37] and uses 4n + o(n) 
bits. It is based on the balanced-parentheses-encoding (BPS) [30] of the Cartesian Tree T of the 
input array A and a o(n)-LCA-computation therein [36]. The difficulty that Sadakane overcomes is 
that in the "original" Cartesian Tree, there is no natural mapping between array-indices in A and 
positions of parentheses (basically because there is no way to distinguish between left and right 
nodes in the BPS of T); therefore, Sadakane introduces n "fake" leaves to get such a mapping. 
There are two main drawbacks of this solution. 

1. Due to the introduction of the "fake" leaves, it does not achieve the information-theoretic lower
bound (for non-systematic schemes) of 2n − Θ(log n) bits. This lower bound is easy to see
because any scheme for RMQs allows the Cartesian Tree to be reconstructed by iteratively querying
the scheme for the minimum (in analogy to the definition of the Cartesian Tree); and because
the Cartesian Tree is binary and each binary tree is a Cartesian Tree for some input array, any
scheme must use at least $\log\left(\binom{2n-1}{n-1} / (2n-1)\right) = 2n - \Theta(\log n)$ bits [25].

2. For getting an O(n)-time construction algorithm, the (modified) Cartesian Tree needs to be
first constructed in a pointer-based implementation, and then converted to the space-saving
BPS. This leads to a construction space requirement of O(n log n) bits, as each node occupies
O(log n) bits in memory. The reason why the BPS cannot be constructed directly in O(n)
time (at least we are not aware of such an algorithm) is that a "local" change in A (be it only
appending a new element at the end) does not necessarily lead to a "local" change in the tree;
this is also the intuitive reason why maintaining dynamic Cartesian Trees is difficult [6].
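
The asymptotic form of the lower bound in the first drawback can be checked with a short calculation. The sketch below uses only Stirling's approximation for the central binomial coefficient; the symbol $C_m$ for the m-th Catalan number does not appear in the text and is introduced here purely for convenience. Logarithms are binary, as everywhere in this article.

```latex
\begin{align*}
\frac{1}{2n-1}\binom{2n-1}{n-1}
  &= \frac{(2n-2)!}{(n-1)!\,n!}
   = \frac{1}{n}\binom{2n-2}{n-1}
   = C_{n-1}, \\
\binom{2m}{m} = \frac{4^m}{\sqrt{\pi m}}\bigl(1 + O(1/m)\bigr)
  \quad&\Longrightarrow\quad
  C_m = \frac{1}{m+1}\binom{2m}{m}
      = \frac{4^m}{\sqrt{\pi}\,m^{3/2}}\bigl(1 + O(1/m)\bigr), \\
\log C_{n-1}
  &= 2(n-1) - \tfrac{3}{2}\log(n-1) - \tfrac{1}{2}\log\pi + O(1/n)
   = 2n - \Theta(\log n).
\end{align*}
```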

1.2 Our Results 

We address the two aforementioned problems of Sadakane's solution [37] and resolve them in the 
following way: 

2 The claimed lower bound of 2n + o(n) + |A| bits under the "min-probe-model" [13] turned out to be wrong, as
was kindly pointed out to the authors by S. Srinivasa Rao (personal communication, November 2007). In fact, it is
easy to lower the space consumption of [13] to (2/c)n + o(n) + |A| bits (constant integer c > 0) by grouping c adjacent
elements in A's blocks together, and "building" the Cartesian Trees only on the minima of these groups.



1. We introduce a new preprocessing scheme for O(1)-RMQs that occupies only 2n + o(n) bits
in memory, thus being the first that asymptotically achieves the information-theoretic lower
bound for non-systematic schemes. The critical reader might call this "lowering the constants"
or "micro-optimization," but we believe that data structures using the smallest possible space
are of high importance, both in theory and in practice. And indeed, there are many examples
of this in the literature: for instance, Munro and Raman [30] give a 2n + o(n)-bit-solution for
representing ordered trees, while supporting most navigational operations in constant time,
although an O(n)-bit-solution (roughly 10n bits [30]) had already been known for some 10 years
before [25]. Another example comes from compressed text indexing [32], where a lot of effort
has been put into achieving indexes of size nH_k + o(n log σ) [11], although indexes of size
O(nH_k) + o(n log σ) had been known earlier [10,22,35]. (Here, H_k is the k-th-order empirical
entropy of the input text T [28] and measures the "compressibility" of T, while σ is T's alphabet
size.)

2. We give a direct construction algorithm for the above scheme that needs only n + o(n) bits of
space in addition to the space for the final scheme, thus lowering the construction space for
non-systematic schemes from O(n log n) to O(n) bits (on top of A). This is a significant improvement,
as the space for storing A is not necessarily Θ(n log n); for example, if the numbers in A are
integers in the range [1, log^{O(1)} n], A can be stored as an array of packed words using only
O(n log log n) bits of space. See Sect. 6 for a different example. The construction space is an
important issue and often limits the practicality of a data structure, especially for large inputs
(as they arise nowadays in web-page-analysis or computational biology).

The intuitive explanation why our scheme works better than Sadakane's scheme [37] is that ours 
is based on a new tree in which the preorder-numbers of the nodes correspond to the array-indices 
in A, thereby rendering the introduction of "fake" leaves (as described earlier) unnecessary. In 
summary, this article is devoted to proving 

Theorem 1. For an array A of n objects from a totally ordered universe, there is a preprocessing
scheme for O(1)-RMQs on A that occupies only $2n + O\left(\frac{n \log\log n}{\log n}\right)$ bits of memory, while not needing
access to A after its construction, thus meeting the information-theoretic lower bound. This scheme
can be constructed in O(n) time, using only n + o(n) bits of space in addition to the space for the
input and the final scheme.

This result is not only appealing in theory, but also important in practice. For example, when
RMQs are used in conjunction with sequences of DNA (genomic data), where the alphabet size σ is
4, storing the DNA even in uncompressed form takes only 2n bits, already less than the 4n bits of
Sadakane's solution [37]. Hence, halving the space for RMQs leads to a significant reduction of total
space. Further, because n is typically very large (n ≈ 2^32 for the human genome), a construction
space of O(n log n) bits is much higher than the O(n log σ) bits for the DNA itself. An additional
(practical) advantage of our new scheme is that it also halves the space of the lower order terms
("o(2n) vs. o(4n) bits"). This is particularly relevant for realistic problem sizes, where the lower
order terms dominate the linear term. An implementation in C++ of our new scheme can be
downloaded [...].

1.3 Outline

Sect. 2 presents some basic tools. Sect. 3 introduces the new preprocessing scheme. Sect. 4 addresses
the linear-time construction of the scheme. Sect. 5 lowers the second-order term by giving a new
data structure for LCA-computation in succinct trees. Sect. 6 shows a concrete example of an
application where our new preprocessing scheme improves on the total space.



2 Preliminaries 



This section sketches some known data structures that we are going to make use of. Throughout
this article, we use the standard word-RAM model of computation, where fundamental arithmetic
operations on words consisting of O(log n) consecutive bits can be computed in O(1) time.

2.1 Rank and Select on Binary Strings 

Consider a bit-string S[1, n] of length n. We define the fundamental rank- and select-operations
on S as follows: rank_1(S, i) gives the number of 1's in the prefix S[1, i], and select_1(S, i) gives the
position of the i'th 1 in S, reading S from left to right (1 ≤ i ≤ n). Operations rank_0(S, i) and
select_0(S, i) are defined similarly for 0-bits. There are data structures of size $O\left(\frac{n \log\log n}{\log n}\right)$ bits in
addition to S that support rank- and select-operations in O(1) time [29].
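
The semantics of these two operations can be pinned down with a deliberately naive sketch. It runs in linear time per call and is therefore not the O(1)-time structure of [29]; the function names `rank` and `select` and the choice of a Python string of '0'/'1' characters as bit-string are assumptions made for illustration.

```python
def rank(S, bit, i):
    """Number of occurrences of `bit` in the prefix S[1..i] (1-based, inclusive)."""
    return S[:i].count(bit)


def select(S, bit, i):
    """Position (1-based) of the i-th occurrence of `bit` in S, or None if absent."""
    seen = 0
    for pos, b in enumerate(S, start=1):
        if b == bit:
            seen += 1
            if seen == i:
                return pos
    return None


S = "0110100"
print(rank(S, "1", 4))    # prints 2: the prefix "0110" contains two 1's
print(select(S, "1", 3))  # prints 5: the third 1 is at position 5
```

The two operations are inverse to each other in the sense that rank(S, b, select(S, b, i)) = i whenever the i-th occurrence of b exists.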

2.2 Data Structures for ±1RMQ 

Consider an array E[1, n] of natural numbers, where the difference between consecutive elements
in E is either +1 or −1 (i.e., E[i] − E[i − 1] = ±1 for all 1 < i ≤ n). Such an array E can be
encoded as a bit-vector S[1, n], where S[1] = 0, and for i > 1, S[i] = 1 iff E[i] − E[i − 1] = +1.
Then E[i] can be obtained by $E[1] + \mathit{rank}_1(S, i) - \mathit{rank}_0(S, i) + 1 = E[1] + i - 2\,\mathit{rank}_0(S, i) + 1$.
Under this setting, Sadakane [36] shows how to support RMQs on E in O(1) time, using S and
additional structures of size $O\left(\frac{n \log^2\log n}{\log n}\right)$ bits. We will improve this space to $O\left(\frac{n \log\log n}{\log n}\right)$ in Sect. 5.
A technical detail is that ±1RMQ(i, j) yields the position of the leftmost minimum in E[i, j] if
there are multiple occurrences of this minimum.
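
The encoding and the formula for recovering E[i] can be tried out with the following sketch. It uses plain Python lists with an unused slot at index 0 to mimic 1-based indexing, and it counts 0-bits by a linear scan instead of a constant-time rank structure; the names `encode_pm1` and `decode_entry` are hypothetical.

```python
def encode_pm1(E):
    """Encode a +-1 array E[1..n] (E[0] unused, n >= 1) as bits S[1..n] (S[0] unused)."""
    n = len(E) - 1
    S = [None, 0]  # S[1] = 0 by convention
    for i in range(2, n + 1):
        d = E[i] - E[i - 1]
        if d not in (1, -1):
            raise ValueError("consecutive entries must differ by exactly 1")
        S.append(1 if d == 1 else 0)
    return S


def decode_entry(S, E1, i):
    """Recover E[i] from S and E[1] via E[1] + i - 2 * rank_0(S, i) + 1."""
    rank0 = S[1:i + 1].count(0)  # naive rank_0(S, i)
    return E1 + i - 2 * rank0 + 1


E = [None, 3, 4, 3, 2, 3]
S = encode_pm1(E)
print(S[1:])                                            # prints [0, 1, 0, 0, 1]
print([decode_entry(S, E[1], i) for i in range(1, 6)])  # prints [3, 4, 3, 2, 3]
```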

2.3 Sequences of Balanced Parentheses 

A string B[1, 2n] of n opening parentheses '(' and n closing parentheses ')' is called balanced if
in each prefix B[1, i], 1 ≤ i ≤ 2n, the number of ')'s is no more than the number of '('s. Operation
findopen(B, i) returns the position j of the "matching" opening parenthesis for the closing
parenthesis at position i in B. This position j is defined as the largest j < i for which
$\mathit{rank}_((B, i) - \mathit{rank}_)(B, i) = \mathit{rank}_((B, j-1) - \mathit{rank}_)(B, j-1)$. The findopen-operation can be computed
in constant time [30]; the most space-efficient data structure for this needs $O\left(\frac{n \log\log n}{\log n}\right)$ bits [18].
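
As an illustration of what findopen computes, here is a naive sketch that scans leftwards from the closing parenthesis and keeps track of the nesting depth. It takes linear time and is only meant to fix the semantics, not to stand in for the constant-time structures of [30,18]; the string representation and the function name are assumptions.

```python
def findopen(B, i):
    """1-based position of the '(' that matches the ')' at 1-based position i in B."""
    if B[i - 1] != ")":
        raise ValueError("position i must hold a closing parenthesis")
    depth = 0
    for j in range(i, 0, -1):  # scan leftwards, starting at i itself
        depth += 1 if B[j - 1] == ")" else -1
        if depth == 0:         # every ')' seen so far has found its '('
            return j
    raise ValueError("sequence is not balanced")


B = "(()())"
print(findopen(B, 3))  # prints 2
print(findopen(B, 6))  # prints 1
```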

2.4 Depth-First Unary Degree Encoding of Ordered Trees 

The Depth-First Unary Degree Sequence (DFUDS) U of an ordered tree T is defined as follows [4].
If T is a leaf, U is given by '()'. Otherwise, if the root of T has w subtrees T_1, ..., T_w in this order,
U is given by the juxtaposition of w + 1 '('s, a ')', and the DFUDS's of T_1, ..., T_w in this order,
with the first '(' of each being omitted. It is easy to see that the resulting sequence is balanced,
and that it can be interpreted as a preorder-listing of T's nodes, where, ignoring the very first '(',
a node with w children is encoded in unary as $(^w)$, i.e., w '('s followed by a single ')' (hence the name DFUDS).
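
The preorder interpretation gives a direct way of writing down the DFUDS. The sketch below assumes a hypothetical tree representation, a mapping from each node to the ordered list of its children, and emits one unary degree code per node in preorder after the initial '('.

```python
def dfuds(children, root=0):
    """DFUDS of an ordered tree given as a mapping: node -> ordered list of children."""
    out = ["("]          # the very first '('
    stack = [root]       # explicit stack for an iterative preorder traversal
    while stack:
        v = stack.pop()
        kids = children[v]
        out.append("(" * len(kids) + ")")  # degree of v in unary
        stack.extend(reversed(kids))       # leftmost child is visited next
    return "".join(out)


# Root 0 with children 1 and 2; node 2 has the single child 3.
tree = {0: [1, 2], 1: [], 2: [3], 3: []}
print(dfuds(tree))  # prints ((())())
```

For a tree consisting of a single leaf the function returns '()', in agreement with the base case of the definition.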

3 The New Preprocessing Scheme 

We are now ready to dive into the technical details of our new preprocessing scheme. The basis
will be a new tree, the 2d-Min-Heap, defined as follows. Recall that A[1, n] is the array to be
preprocessed for RMQs. For technical reasons, we define A[0] = −∞ as the "artificial" overall
minimum.






Fig. 1. Top: The 2d-Min-Heap M_A of the input array A. Bottom: M_A's DFUDS U and U's excess
sequence E. Two example queries RMQ_A(i, j) are underlined, including their corresponding queries
±1RMQ_E(x, y).

Definition 1. The 2d-Min-Heap M_A of A is a labeled and ordered tree with vertices v_0, ..., v_n,
where v_i is labeled with i for all 0 ≤ i ≤ n. For 1 ≤ i ≤ n, the parent node of v_i is v_j iff j < i,
A[j] < A[i], and A[k] ≥ A[i] for all j < k < i. The order of the children is chosen such that their
labels are increasing from left to right.

Observe that this is a well-defined tree with the root being always labeled as 0, and that a node
v_i can be uniquely identified by its label i, which we will do henceforth. See Fig. 1 for an example.
We note the following useful properties of M_A.

Lemma 1. Let M_A be the 2d-Min-Heap of A.

1. The node labels correspond to the preorder-numbers of M_A (counting starts at 0).

2. Let i be a node in M_A with children x_1, ..., x_k. Then A[i] < A[x_j] for all 1 ≤ j ≤ k.

3. Again, let i be a node in M_A with children x_1, ..., x_k. Then A[x_j] ≤ A[x_{j−1}] for all 1 < j ≤ k.

Proof. Because the root of M_A is always labeled with 0 and the order of the children is induced by
their labels, property 1 holds. Property 2 follows immediately from Def. 1. For property 3, assume
for the sake of contradiction that A[x_j] > A[x_{j−1}] for two children x_j and x_{j−1} of i. From property
1, we know that i < x_{j−1} < x_j, contradicting the definition of the parent-child-relationship in M_A,
which says that A[k] ≥ A[x_j] for all i < k < x_j. □

Properties 2 and 3 of the above lemma explain the choice of the name "2d-Min-Heap," because
M_A exhibits a minimum-property on both the parent-child- and the sibling-sibling-relationship,
i.e., in two dimensions.
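
By the definition of the 2d-Min-Heap, the parent of node i is the closest position to the left of i that holds a strictly smaller value. A pointer-based version of the tree can therefore be sketched with the usual stack scan for nearest smaller values. This is an illustration only (it uses O(n log n) bits and is not the space-efficient construction of this article); the function name and the list layout with an ignored slot A[0] are assumptions.

```python
import math


def two_d_min_heap_parents(A):
    """Parent array of the 2d-Min-Heap of A[1..n]; A[0] is ignored and treated as -infinity.

    Returns a list parent[0..n] with parent[0] = None (node 0 is the root).
    """
    n = len(A) - 1
    vals = [-math.inf] + list(A[1:])
    parent = [None] * (n + 1)
    stack = [0]  # indices whose values increase strictly from bottom to top
    for i in range(1, n + 1):
        # Discard everything that is not strictly smaller than A[i].
        while vals[stack[-1]] >= vals[i]:
            stack.pop()
        parent[i] = stack[-1]  # closest strictly smaller value to the left
        stack.append(i)
    return parent


A = [None, 5, 3, 4, 3, 4]
print(two_d_min_heap_parents(A))  # prints [None, 0, 0, 2, 0, 4]
```

In this example the root has the children 1, 2 and 4 with values 5, 3 and 3, which are larger than the root's value and non-increasing from left to right, as properties 2 and 3 of the lemma state.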

The following lemma will be central for our scheme, as it gives the desired connection of 2d- 
Min-Heaps and RMQs. 

Lemma 2. Let M_A be the 2d-Min-Heap of A. For arbitrary nodes i and j, 1 ≤ i < j ≤ n, let ℓ
denote the LCA of i and j in M_A (recall that we identify nodes with their labels). Then if ℓ = i,
RMQ_A(i, j) is given by i, and otherwise, RMQ_A(i, j) is given by the child of ℓ that is on the path
from ℓ to j.

Proof. For an arbitrary node x in M_A, let T_x denote the subtree of M_A that is rooted at x. There
are two cases to prove.

ℓ = i. This means that j is a descendant of i. Due to property 1 of Lemma 1, this implies that all
nodes i, i + 1, ..., j are in T_i, and the recursive application of property 2 implies that A[i] is the
minimum in the query range.

ℓ ≠ i. Let x_1, ..., x_k be the children of ℓ. Further, let α and β (1 ≤ α < β ≤ k) be defined such
that T_{x_α} contains i, and T_{x_β} contains j. Because ℓ ≠ i and property 1 of Lemma 1, we must
have ℓ < i; in other words, the LCA is not in the query range. But also due to property 1, every
node in [i, j] is in T_{x_γ} for some α ≤ γ ≤ β, and in particular x_γ ∈ [i, j] for all α < γ ≤ β. Taking
this together with property 2, we see that {x_γ : α < γ ≤ β} are the only candidate positions
for the minimum in A[i, j]. Due to property 3, we see that x_β (the child of ℓ on the path to j)
is the position where the overall minimum in A[i, j] occurs. □

Note that (unlike for ±1RMQ) this algorithm yields the rightmost minimum in the query range if
this is not unique. However, it can be easily arranged to return the leftmost minimum by adapting
the definition of the 2d-Min-Heap, if this is desired.
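
The statement of Lemma 2 can be checked experimentally on a pointer-based 2d-Min-Heap. The sketch below is an assumption-laden illustration: instead of a constant-time LCA structure it simply walks from j towards the root, which works because node labels are preorder numbers, so every ancestor of j below the LCA has a label larger than i. All function names are hypothetical.

```python
import math


def build_parents(A):
    """Parent array of the 2d-Min-Heap of A[1..n] (A[0] ignored, treated as -infinity)."""
    n = len(A) - 1
    vals = [-math.inf] + list(A[1:])
    parent, stack = [None] * (n + 1), [0]
    for i in range(1, n + 1):
        while vals[stack[-1]] >= vals[i]:
            stack.pop()
        parent[i] = stack[-1]
        stack.append(i)
    return parent


def rmq_via_heap(parent, i, j):
    """RMQ_A(i, j) for 1 <= i < j <= n according to Lemma 2, using parent pointers only."""
    c = j
    while parent[c] > i:  # climb while still strictly below the LCA's child
        c = parent[c]
    # Either i itself is an ancestor of j (then the LCA is i), or c is the
    # child of the LCA that lies on the path to j.
    return i if parent[c] == i else c


def rmq_rightmost_naive(A, i, j):
    """Brute-force reference: rightmost position of the minimum in A[i..j]."""
    best = i
    for k in range(i + 1, j + 1):
        if A[k] <= A[best]:
            best = k
    return best


A = [None, 5, 3, 4, 3, 4, 5, 1, 3, 2, 4, 2, 5, 3, 5, 5, 4]
parent = build_parents(A)
print(rmq_via_heap(parent, 8, 16))  # prints 11, the rightmost minimum of A[8..16]
```

The walk takes time proportional to the depth of j; it only serves to make the lemma tangible.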

To achieve the optimal 2n + o(n) bits for our scheme, we represent the 2d-Min-Heap M_A by
its DFUDS U and o(n) structures for rank_)-, select_)-, and findopen-operations on U (see Sect. 2).
We further need structures for ±1RMQ on the excess-sequence E[1, 2n] of U, defined as
$E[i] = \mathit{rank}_((U, i) - \mathit{rank}_)(U, i)$. This sequence clearly satisfies the property that subsequent elements differ
by exactly 1, and is already encoded in the right form (by means of the DFUDS U) for applying
the ±1RMQ-scheme from Sect. 2.2.

The reasons for preferring the DFUDS over the BPS-representation [30] of M_A are (1) the
operations needed to perform on M_A are particularly easy on DFUDS (see the next corollary), and
(2) we have found a fast and space-efficient algorithm for constructing the DFUDS directly (see
the next section).

Corollary 1. Given the DFUDS U of M_A, RMQ_A(i, j) can be answered in O(1) time by the
following sequence of operations (1 ≤ i < j ≤ n).

1. x ← select_)(U, i + 1)

2. y ← select_)(U, j)

3. w ← ±1RMQ_E(x, y)

4. if rank_)(U, findopen(U, w)) = i then return i

5. else return rank_)(U, w)

Proof. Let ℓ be the true LCA of i and j in M_A. Inspecting the details of how LCA-computation in
DFUDS is done [26, Lemma 3.2], we see that after the ±1RMQ-call in line 3 of the above algorithm,
w + 1 contains the starting position in U of the encoding of ℓ's child that is on the path to j.³ Line
4 checks if ℓ = i by comparing their preorder-numbers and returns i in that case (case 1 of Lemma
2); it follows from the description of the parent-operation in the original article on DFUDS [4]
that this is correct. Finally, in line 5, the preorder-number of ℓ's child that is on the path to j is
computed correctly (case 2 of Lemma 2). □

We have shown these operations so explicitly in order to emphasize the simplicity of our
approach. Note in particular that not all operations on DFUDS have to be "implemented" for our
RMQ-scheme, and that we find the correct child of the LCA ℓ directly, without finding ℓ explicitly.
We encourage the reader to work on the examples in Fig. 1, where the respective RMQs in both A
and E are underlined and labeled with the variables from Cor. 1.
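
The five steps of the corollary translate almost literally into code. The following sketch replaces every constant-time primitive (rank, select, findopen, ±1RMQ) by a naive linear-time scan, so it illustrates only the logic of the query, not its running time. The DFUDS is obtained here from node degrees via a nearest-smaller-value scan, which is valid because node labels are preorder numbers; all function names are assumptions.

```python
import math


def dfuds_of_2d_min_heap(A):
    """DFUDS U (as a string) of the 2d-Min-Heap of A[1..n]; A[0] is treated as -infinity."""
    n = len(A) - 1
    vals = [-math.inf] + list(A[1:])
    deg, stack = [0] * (n + 1), [0]
    for i in range(1, n + 1):
        while vals[stack[-1]] >= vals[i]:
            stack.pop()
        deg[stack[-1]] += 1  # stack[-1] is the parent of i
        stack.append(i)
    return "(" + "".join("(" * d + ")" for d in deg)


def rank_close(U, p):
    """Number of ')' in U[1..p]."""
    return U[:p].count(")")


def select_close(U, k):
    """1-based position of the k-th ')' in U."""
    seen = 0
    for pos, c in enumerate(U, start=1):
        seen += c == ")"
        if seen == k:
            return pos
    raise ValueError("fewer than k closing parentheses")


def findopen(U, p):
    """1-based position of the '(' matching the ')' at position p."""
    depth = 0
    for q in range(p, 0, -1):
        depth += 1 if U[q - 1] == ")" else -1
        if depth == 0:
            return q
    raise ValueError("no matching parenthesis")


def pm1_rmq(U, x, y):
    """Leftmost position of the minimum of the excess sequence E[x..y] of U."""
    excess, best_pos, best_val = 0, None, None
    for pos in range(1, y + 1):
        excess += 1 if U[pos - 1] == "(" else -1
        if pos >= x and (best_val is None or excess < best_val):
            best_pos, best_val = pos, excess
    return best_pos


def rmq(U, i, j):
    """RMQ_A(i, j) for 1 <= i < j <= n, following the five steps of the corollary."""
    x = select_close(U, i + 1)                  # step 1
    y = select_close(U, j)                      # step 2
    w = pm1_rmq(U, x, y)                        # step 3
    if rank_close(U, findopen(U, w)) == i:      # step 4
        return i
    return rank_close(U, w)                     # step 5


def rmq_rightmost_naive(A, i, j):
    """Brute-force reference: rightmost position of the minimum in A[i..j]."""
    return max(range(i, j + 1), key=lambda k: (-A[k], k))


A = [None, 5, 3, 4, 3, 4]
U = dfuds_of_2d_min_heap(A)
print(U)             # prints (((())())())
print(rmq(U, 2, 5))  # prints 4
```

Note that the array A is needed only to build U; the query function `rmq` consults U alone, which is the non-systematic setting.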



3 In line 1, we correct a minor error in the original article [26] by computing the starting position x slightly differently, 
which is necessary in the case that i = LCA(i, j) (confirmed by K. Sadakane, personal communication, May 2008). 



4 Construction of 2d-Min-Heaps 



We now show how to construct the DFUDS U of M_A in linear time and n + o(n) bits of extra
space. We first give a general O(n)-time algorithm that uses O(n log n) bits (Sect. 4.1), and then
show how to reduce its space to n + o(n) bits, while still having linear running time (Sect. 4.2).

4.1 The General Linear-Time Algorithm 

We show how to construct U (the DFUDS of M_A) in linear time. The idea is to scan A from right
to left and build U from right to left, too. Suppose we are currently in step i (n ≥ i ≥ 0), and
A[i + 1, n] have already been scanned. We keep a stack S[1, h] (where S[h] is the top) with the
properties that A[S[h]] ≥ ··· ≥ A[S[1]], and i < S[h] < ··· < S[1] ≤ n. S contains exactly those
indices j ∈ [i + 1, n] for which A[k] ≥ A[j] for all i < k < j. Initially, both S and U are empty.
When in step i, we first write a ')' to the current beginning of U, and then pop all w indices from
S for which the corresponding entry in A is strictly greater than A[i]. To reflect this change in U,
we write w opening parentheses '(' to the current beginning of U. Finally, we push i on S and move
to the next (i.e., preceding) position i − 1. It is easy to see that these changes on S maintain the
properties of the stack. If i = 0, we write an initial '(' to U and stop the algorithm.

The correctness of this algorithm follows from the fact that due to the definition of M_A, the
degree of node i is given by the number w of array-indices to the right of i which have A[i] as their
closest smaller value (properties 2 and 3 of Lemma 1). Thus, in U node i is encoded as $(^w)$, which
is exactly what we do. Because each index is pushed and popped exactly once on/from S, the linear
running time follows.
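
The right-to-left scan described above can be sketched as follows. The sketch keeps the stack as an ordinary Python list (hence O(n log n) bits), collects the node encodings in a list and reverses them at the end instead of writing into a preallocated U from the right; these representation choices and the function name `build_dfuds` are assumptions.

```python
import math


def build_dfuds(A):
    """DFUDS of the 2d-Min-Heap of A[1..n] via a right-to-left stack scan.

    A[0] is ignored and treated as the artificial minimum -infinity.
    """
    n = len(A) - 1
    vals = [-math.inf] + list(A[1:])
    pieces = []  # encodings of nodes n, n-1, ..., 0 (reversed at the end)
    stack = []   # indices to the right of i that have not yet found their parent
    for i in range(n, -1, -1):
        w = 0
        # Pop all indices whose value is strictly greater than A[i]:
        # these are exactly the children of node i.
        while stack and vals[stack[-1]] > vals[i]:
            stack.pop()
            w += 1
        pieces.append("(" * w + ")")  # node i with w children
        stack.append(i)
    return "(" + "".join(reversed(pieces))  # prepend the initial '('


print(build_dfuds([None, 2, 1, 3]))        # prints ((())())
print(build_dfuds([None, 5, 3, 4, 3, 4]))  # prints (((())())())
```

Since A[0] = −∞ pops every remaining index in the last step, the stack holds only index 0 at the end, and the output always has length 2(n + 1).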

4.2 O(n)-bit Solution

The only drawback of the above algorithm is that stack S requires O(n log n) bits in the worst
case. We solve this problem by representing S as a bit-vector S'[1, n]. S'[i] is 1 if i is on S, and 0
otherwise. In order to maintain constant time access to S, we use a standard blocking-technique
as follows. We logically group $s = \lceil \frac{\log n}{2} \rceil$ consecutive elements of S' into blocks
$B_0, \dots, B_{\lfloor (n-1)/s \rfloor}$. Further, $s' = s^2$ elements are grouped into super-blocks
$B'_0, \dots, B'_{\lfloor (n-1)/s' \rfloor}$.

For each such (super-)block B that contains at least one 1, in a new table M (or M', respectively)
at position x we store the block number of the leftmost (super-)block to the right of B that contains
a 1, in M only relative to the beginning of the super-block. These tables need $O\left(\frac{n}{s} \log(s'/s)\right) = O\left(\frac{n \log\log n}{\log n}\right)$
and $O\left(\frac{n}{s'} \log(n/s)\right) = O\left(\frac{n}{\log n}\right)$ bits of space, respectively. Further, for all possible
bit-vectors of length s we maintain a table P that stores the position of the leftmost 1 in that vector.
This table needs $O(2^s \cdot \log s) = O(\sqrt{n} \log\log n) = o(n)$ bits. Next, we show how to use these tables
for constant-time access to S, and how to keep M and M' up to date.

When entering step i of the algorithm, we know that S'[i + 1] = 1, because position i + 1 has
been pushed on S as the last operation of the previous step. Thus, the top of S is given by i + 1.
For finding the leftmost 1 in S' to the right of j > i (position j has just been popped from S),
we first check if j's block B_x, $x = \lfloor (j-1)/s \rfloor$, contains a 1, and if so, find this leftmost 1 by consulting
P. If B_x does not contain a 1, we jump to the next block B_y containing a 1 by first jumping to
y = x + M[x], and if this block does not contain a 1, by further jumping to $y = M'[\lfloor (j-1)/s' \rfloor]$. In block
y, we can again use P to find the leftmost 1. Thus, we can find the new top of S in constant time.

In order to keep M up to date, we need to handle the operations where (1) elements are pushed
on S (i.e., a 0 is changed to a 1 in S'), and (2) elements are popped from S (a 1 changed to a 0).
Because in step i only i is pushed on S, for operation (1) we just need to store the block number
y of the former top in M[x] ($x = \lfloor (i-1)/s \rfloor$) if this is in a different block (i.e., if x ≠ y). Changes to
M' are similar. For operation (2), nothing has to be done at all, because even if the popped index
was the last 1 in its (super-)block, we know that all (super-)blocks to the left of it do not contain
a 1, so no values in M and M' have to be changed. Note that this only works because elements to
the right of i will never be pushed again onto S. This completes the description of the n + o(n)-bit
construction algorithm.
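
The key observation behind the bit-vector representation is that the stack content is fully determined by the set of indices on it, with the top being the leftmost 1 to the right of the current position. The following simplified sketch shows this idea only: it finds the next 1 by a plain linear scan, which can cost more than constant time per pop, whereas the tables M, M' and P described above exist precisely to make that step O(1). The function name, the use of a `bytearray` with one byte per flag, and the extra slot for index 0 are assumptions.

```python
import math


def build_dfuds_bitstack(A):
    """DFUDS of the 2d-Min-Heap of A[1..n], with the stack kept as a vector of flags."""
    n = len(A) - 1
    vals = [-math.inf] + list(A[1:])
    on_stack = bytearray(n + 2)  # on_stack[k] == 1 iff index k is on the stack
    pieces = []
    for i in range(n, -1, -1):
        w = 0
        top = i + 1  # the previous step pushed i + 1, so this is the current top
        while top <= n and vals[top] > vals[i]:
            on_stack[top] = 0  # pop: a 1 is changed to a 0
            w += 1
            top += 1
            # New top = leftmost 1 to the right of the popped index.
            while top <= n and not on_stack[top]:
                top += 1
        pieces.append("(" * w + ")")
        on_stack[i] = 1        # push: a 0 is changed to a 1
    return "(" + "".join(reversed(pieces))


print(build_dfuds_bitstack([None, 5, 3, 4, 3, 4]))  # prints (((())())())
```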

5 Lowering the Second-Order-Term

Until now, the second-order-term is dominated by the $O\left(\frac{n \log^2\log n}{\log n}\right)$ bits from Sadakane's
preprocessing scheme for ±1RMQ (Sect. 2.2), while all other terms (for rank, select and findopen) are
$O\left(\frac{n \log\log n}{\log n}\right)$. We show in this section a simple way to lower the space for ±1RMQ to $O\left(\frac{n \log\log n}{\log n}\right)$,
thereby completing the proof of Thm. 1.

As in the original algorithm [36], we divide the input array E into $n' = \lfloor \frac{n-1}{s} \rfloor$ blocks of size
$s = \lceil \frac{\log n}{2} \rceil$. Queries are decomposed into at most three non-overlapping sub-queries, where the first
and the last sub-queries are inside of the blocks of size s, and the middle one exactly spans over
blocks. The two queries inside of the blocks are answered by table lookups using $O(\sqrt{n} \log^2 n)$ bits,
as in the original algorithm.

For the queries spanning exactly over blocks of size s, we proceed as follows. Define a new
array E'[0, n'] such that E'[i] holds the minimum of E's i'th block. E' is represented only implicitly
by an array E''[0, n'], where E''[i] holds the position of the minimum in the i'th block, relative
to the beginning of that block. Then E'[i] = E[is + E''[i]]. Because E'' stores n/log n numbers
from the range [1, s], the size for storing E'' is thus $O\left(\frac{n \log\log n}{\log n}\right)$ bits. Note that unlike E, E' does
not necessarily fulfill the ±1-property. E' is now preprocessed for constant-time RMQs with the
systematic scheme of Fischer and Heun [13], using $2n' + o(n') = O\left(\frac{n}{\log n}\right)$ bits of space. Thus, by
querying RMQ_{E'}(i, j) for 1 ≤ i ≤ j ≤ n', we can also find the minima for the sub-queries spanning
exactly over the blocks in E.

Two comments are in order at this place. First, the used RMQ-scheme [13] does allow the input
array to be represented implicitly, as in our case. And second, it does not use Sadakane's solution
for ±1RMQ, so there are no circular dependencies.
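
The decomposition into at most three sub-queries and the implicit representation of the block minima can be illustrated as follows. In this sketch the in-block queries and the query over the block minima are answered by naive scans; in the actual scheme they are replaced by table lookups and by the systematic scheme [13], respectively. The function names are hypothetical, block b is taken to cover positions bs + 1, ..., (b + 1)s, and the block size is passed in explicitly rather than derived from n.

```python
def block_min_positions(E, s):
    """E''[b]: offset in [1, s] of the leftmost minimum of block b of E[1..n] (E[0] unused)."""
    n = len(E) - 1
    E2 = []
    for start in range(1, n + 1, s):
        end = min(start + s - 1, n)
        best = min(range(start, end + 1), key=lambda p: E[p])  # first minimum wins
        E2.append(best - start + 1)
    return E2


def rmq_blocked(E, E2, s, i, j):
    """Leftmost position of the minimum in E[i..j], using E2 for the block-spanning part."""
    def scan(lo, hi):
        # Naive stand-in for an in-block table lookup.
        return min(range(lo, hi + 1), key=lambda p: E[p])

    bi, bj = (i - 1) // s, (j - 1) // s
    if bi == bj:
        return scan(i, j)
    candidates = [scan(i, (bi + 1) * s)]  # first sub-query: rest of i's block
    if bi + 1 <= bj - 1:
        # Middle sub-query: E'[b] = E[b*s + E2[b]] is never stored explicitly.
        b = min(range(bi + 1, bj), key=lambda c: E[c * s + E2[c]])
        candidates.append(b * s + E2[b])
    candidates.append(scan(bj * s + 1, j))  # last sub-query: beginning of j's block
    return min(candidates, key=lambda p: E[p])


E = [None, 1, 2, 3, 2, 1, 2, 1, 0, 1, 2, 1]
E2 = block_min_positions(E, 3)
print(E2)                            # prints [1, 2, 2, 2]
print(rmq_blocked(E, E2, 3, 2, 10))  # prints 8
```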

As a corollary, this approach also lowers the space for LCA-computation in BPS [36] and
DFUDS [26] from $O\left(\frac{n \log^2\log n}{\log n}\right)$ to $O\left(\frac{n \log\log n}{\log n}\right)$, as these are based on ±1RMQ:

Corollary 2. Given the BPS or DFUDS of an ordered tree T, there is a data structure of size
$O\left(\frac{n \log\log n}{\log n}\right)$ bits that allows LCA-queries in T to be answered in constant time.

6 Application in Document Retrieval Systems 

We now sketch a concrete example of where Thm. 1 lowers the construction space of a different
data structure. This section is meant to show that there are indeed applications where the memory
bottleneck is the construction space for RMQs. We consider the following problem:

Problem 2 (Document Listing Problem [31]).

Given: a collection of k text documents $\mathcal{D} = \{D_1, \dots, D_k\}$ of total length n.
Compute: an index that, given a search pattern P of length m, returns all d documents from $\mathcal{D}$
that contain P, in time proportional to m and d (in contrast to all occurrences of P in $\mathcal{D}$).



Sadakane [37, Sect. 4] gives a succinct index for this problem. It uses three parts, for convenience
listed here together with their final size:

– Compressed suffix array [35] A of the concatenation of all k documents, $|A| = \frac{1}{\epsilon} H_0 n + O(n)$
bits.

– Array of document identifiers D, defined by D[i] = j iff the A[i]'th suffix "belongs to" document
j. Its size is $O(k \log \frac{n}{k})$ bits.

– Range minimum queries on an array C, |RMQ| = 4n + o(n) bits. Here, C stores positions in A
of nearest previous occurrences of indexed positions from the same document, C[i] = max{j <
i : D[j] = D[i]}. In the query algorithm, only the positions of the minima matter; hence, this is
a non-systematic setting.

Apart from halving the space for RMQ from 4n to 2n bits, our new scheme also lowers the
peak space consumption of Sadakane's index for the Document Listing Problem. Let us consider
the construction time and space for each part in turn:

– Array A can be built in O(n) time and O(n) bits (constant alphabet), or O(n log log |Σ|) time
using O(n log |Σ|) bits (arbitrary alphabet Σ) of space [24].

– Array D is actually implemented as a fully indexable dictionary [34] called D', and can certainly
be built in linear time using O(n) bits working space, as we can always couple the
block-encodings [34] with the o(n)-bit structures for uncompressed solutions for rank and select [29].

– As already mentioned before, for a fast construction of Sadakane's scheme for O(1)-RMQs on
C, we would have needed Θ(n log n) bits. Our new method lowers this to O(n) bits construction
space. Note that array C never needs to be stored plainly during the construction: because C is
scanned only once when building the DFUDS (Sect. 4) and is thus accessed only sequentially,
we only need to store the positions in A of the last seen document identifier for each of the k
documents. This can be done using a plain array, so |C| = O(k log n) bits.
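
The remark on array C in the last item can be made concrete with a small generator that produces the entries of C one at a time while keeping only one "last seen" position per document. This is a sketch under stated assumptions: the document array D is given as a Python list with an unused slot D[0], document identifiers are 1, ..., k, and the value 0 stands for "no previous occurrence", a convention the text above does not specify. The entries are produced from left to right purely for illustration.

```python
def previous_occurrences(D, k):
    """Yield C[1], ..., C[n] for the document array D[1..n] (D[0] unused).

    C[i] is the largest j < i with D[j] == D[i], or 0 if there is none
    (the value 0 is an assumed convention). Only k + 1 counters are kept.
    """
    last_seen = [0] * (k + 1)  # last_seen[d]: last position holding document d
    for i in range(1, len(D)):
        d = D[i]
        yield last_seen[d]
        last_seen[d] = i


D = [None, 1, 2, 1, 3, 2, 1]
print(list(previous_occurrences(D, 3)))  # prints [0, 0, 1, 0, 2, 3]
```

Because the entries are consumed as they are produced, a construction algorithm that reads C only sequentially never needs all n entries in memory at once.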

In summary, we get: 

Theorem 2. The construction space for Sadakane's Index for Document Listing [37] is lowered
from O(n log n) bits to O(n + k log n) bits (constant alphabet) or O(n log |Σ| + k log n) bits (arbitrary
alphabet Σ) with our scheme for RMQs from Thm. 1, while not increasing the construction time.

This is especially interesting if k, the number of documents, is not too large, $k = O\left(\frac{n \log|\Sigma|}{\log n}\right)$.

7 Concluding Remarks

We have given the first optimal preprocessing scheme for O(1)-RMQs under the important
assumption that the input array is not available after preprocessing. To the expert, it might come
as a surprise that our algorithm is not based on the Cartesian Tree, a concept that has proved to
be very successful in former schemes. Instead, we have introduced a new tree, the 2d-Min-Heap,
which seems to be better suited for our task.⁴ We hope to have thereby introduced a new versatile
data structure to the algorithms community. And indeed, we are already aware of the fact that the

4 The Cartesian Tree and the 2d-Min-Heap are certainly related, as they are both obtained from the array, and it 
would certainly be possible to derive the 2d-Min-Heap (or a related structure obtained from the natural bijection 
between binary and ordered rooted trees) from the Cartesian Tree, and then convert it to the BPS/DFUDS. But
see the second point in Sect. 1.2 for why this is not a good idea.



2d-Min-Heap, made public via a preprint of this article [12], is pivotal to a new data structure for 
succinct trees [39]. 

We leave it as an open research problem whether the 3n + o(n)-bit construction space can be lowered
to an optimal 2n + o(n)-bit "in-place" construction algorithm. (A simple example shows that it is
not possible to use the leading n bits of the DFUDS for the stack.)

8 Recent Developments 

It has recently been shown [20] that replacing the ±1RMQ-call in Cor. 1 by the range restricted
enclose-operation is advantageous in practice, as this latter operation can be implemented by sharing
the most space-consuming parts of the data structures with those of the findopen-operation.